Search CORE

20 research outputs found

Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors

Author: Kurpicz Florian
Publication venue: Springer International Publishing
Publication date: 28/11/2022
Field of study

Bit vectors are fundamental building blocks of succinct data structures used in compressed text indices, e.g., in the form of the wavelet trees. Here, two types of queries are of interest: rank and select queries. In practice, the smallest (uncompressed) rank and select data structure cs-poppy has a space overhead of ≈ 3.51 % [Zhou et al. SEA 2013] [26]. Using the same overhead, we present a data structure that can answer queries up to 8 % (rank) and 16.5 % (select) faster compared with cs-poppy

KITopen

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Author: Fischer Johannes
Kurpicz Florian
Köppl Dominik
Publication venue
Publication date: 01/01/2016
Field of study

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with

p

processors. Given a static text of length

n

, we first show how to compute the suffix array interval of a given pattern of length

m

O(\frac{m}{p}+ \lg p + \lg\lg p\cdot\lg\lg n)

time for

p \le m

. For approximate pattern matching with

k

differences or mismatches, we show how to compute all occurrences of a given pattern in

O(\frac{m^k\sigma^k}{p}\max\left(k,\lg\lg n\right)\!+\!(1+\frac{m}{p}) \lg p\cdot \lg\lg n + \text{occ})

time, where

\sigma

is the size of the alphabet and

p \le \sigma^k m^k

. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns

P

and

P'

, we present a data structure for computing the interval of

PP'

O(\lg\lg n)

sequential time, or in

O(1+\lg_p\lg n)

parallel time. All our data structures are of size

O(n)

bits (in addition to the suffix array)

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Parallel External Memory Wavelet Tree and Wavelet Matrix Construction

Author: Ellert Jonas
Kurpicz Florian
Publication venue: Springer Nature Switzerland AG
Publication date: 20/08/2021
Field of study

KITopen

Faster Block Tree Construction

Author: Kurpicz Florian
Meyer Daniel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023
Field of study

The block tree [Belazzougui et al. J. Comput. Syst. Sci. \u2721] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of smallest prefix such that a specified character has a specified rank) queries. It requires O(zlog(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, up so far compressed versions of the wavelet tree only leverage statistical compression, meaning that they are blind to spaced repetitions. To make block trees usable in practice, a first step is to find ways in constructing them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory

Dagstuhl Research Online Publication Server

On Maximum Common Subgraph Problems in Series-Parallel Graphs

Author: Kriege Nils
Kurpicz Florian
Mutzel Petra
Publication venue: Springer International Publishing
Publication date: 20/08/2021
Field of study

KITopen

Scalable Construction of Text Indexes with Thrill

Author: Bingmann Timo
Gog Simon
Kurpicz Florian
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2018
Field of study

The suffix array is the key to efficient solutions for myriads of string processing problems in different application domains, like data compression, data mining, or bioinformatics. With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing. In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which allows scalable processing of input sizes on distributed systems in orders of magnitude that have not been considered before

Crossref

KITopen

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Author: Fischer Johannes
Kurpicz Florian
Köppl Dominik
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 20/08/2021
Field of study

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ} time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P\u27, we present a data structure for computing the interval of PP\u27 in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array)

KITopen

On Maximum Common Subgraph Problems in Series-Parallel Graphs

Author: Kriege Nils
Kurpicz Florian
Mutzel Petra
Publication venue: Academic Press
Publication date: 20/08/2021
Field of study

KITopen